Smart Paper Notes

Probabilistic Dalek -- Emulator framework with probabilistic prediction for supernova tomography

Wolfgang Kerzendorf , Nutan Chen , Jack O'Brien , Johannes Buchner , Patrick van der Smagt

Category: Machine Learning

2022-09-20

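Supernova spectral time series can be used to reconstruct a spatially resolved explosion model, a technique known as supernova tomography. In addition to an observed spectral time series, supernova tomography requires a radiative transfer model to solve the inverse problem with uncertainty quantification for the reconstruction. The smallest parametrizations of supernova tomography models have roughly a dozen parameters, while a realistic one requires more than 100. Realistic radiative transfer models need tens of CPU minutes for a single evaluation, which makes the problem computationally intractable with traditional means, since such a problem requires millions of MCMC samples. A new method for accelerating simulations with machine learning techniques, known as surrogate models or emulators, offers a solution to such problems and a way to understand progenitors and explosions from spectral time series. Emulators exist for the TARDIS supernova radiative transfer code, but they perform well only on simplistic low-dimensional models (roughly a dozen parameters), with a small number of applications for knowledge gain in the supernova field. In this work, we present a new emulator for the radiative transfer code TARDIS that not only outperforms existing emulators but also provides uncertainties in its predictions. It lays the foundation for a future learning-based machinery that will be able to emulate very high-dimensional spaces of hundreds of parameters, which is crucial for unraveling urgent questions in supernovae and related fields.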

Translated by Google Translate

Local distance preserving auto-encoders using Continuous k-Nearest Neighbours graphs

Nutan Chen , Patrick van der Smagt , Botond Cseke

Category: Machine Learning

2022-06-13

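Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper, we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss based on the continuous k-nearest neighbours graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders, thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.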
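
As a rough illustration of the loss described in this abstract, the sketch below builds a continuous k-nearest neighbours (CkNN) graph on the data and penalises distance mismatches along its edges. The CkNN rule used here (link i and j when d(i, j) < delta * sqrt(d_k(i) * d_k(j))) is the standard definition of that graph; the squared-mismatch loss, the function names, and the default values of `k` and `delta` are assumptions for illustration and are not taken from the paper.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def cknn_edges(dist, k=3, delta=1.0):
    """CkNN graph: link i and j when d(i, j) < delta * sqrt(d_k(i) * d_k(j)),
    where d_k(i) is the distance from point i to its k-th nearest neighbour."""
    # Index 0 of each sorted row is the zero self-distance, so index k is the k-th neighbour.
    d_k = [sorted(row)[k] for row in dist]
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < delta * math.sqrt(d_k[i] * d_k[j])]

def local_distance_loss(data, latents, k=3, delta=1.0):
    """Mean squared mismatch between data-space and latent-space distances over CkNN edges.
    This particular form of the loss is an assumption, not the paper's exact objective."""
    dist_x = pairwise_distances(data)
    dist_z = pairwise_distances(latents)
    edges = cknn_edges(dist_x, k, delta)
    if not edges:
        return 0.0
    return sum((dist_x[i][j] - dist_z[i][j]) ** 2 for i, j in edges) / len(edges)
```

A latent embedding that is a rigid rotation of the data gives a loss of zero under this sketch, while one that rescales the data does not. In the constrained formulation the abstract describes, a term like this would be the objective and the reconstruction error would enter as a constraint.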

Flat Latent Manifolds for Human-machine Co-creation of Music

Nutan Chen , Djalel Benbouzid , Francesco Ferroni , Mathis Nitschke , Luciano Pinna , Patrick van der Smagt

Category: Machine Learning

2022-02-23

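The use of machine learning in artistic music generation leads to controversial discussions about the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay leads to new experiences for both the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent variational auto-encoders (VAEs) and learn to generate music seeded by a human musician. In the learned model, we generate novel musical sequences by interpolating in latent space. Standard VAEs, however, do not guarantee any form of smoothness in their latent representation, which translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to Euclidean space. As a result, linear interpolation in the latent space yields realistic and smooth musical changes that fit the machine-musician interactions we aim for. We provide empirical evidence for our method through a set of experiments on music datasets, and we deploy our model in an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the drummer can intuitively interpret and exploit the latent representation to drive the interplay. Beyond the musical application, our approach showcases an instance of machine-learning model design driven by interpretability and interaction with the end user.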
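
The role of flatness in this abstract can be illustrated with a small sketch: if the decoder is an isometry, equal steps along a straight line in latent space produce equal-sized changes in the output. The decoder below is a hypothetical stand-in (a fixed rotation of a 2-D code into 3-D), not the recurrent VAE from the paper, and all function names are illustrative.

```python
import math

def interpolate_latents(z_start, z_end, num_steps):
    """Latent codes evenly spaced on the straight line from z_start to z_end (num_steps >= 2)."""
    path = []
    for step in range(num_steps):
        t = step / (num_steps - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

def step_lengths(decode, path):
    """Euclidean distance between consecutive decoded outputs along the path."""
    outputs = [decode(z) for z in path]
    return [math.dist(u, v) for u, v in zip(outputs, outputs[1:])]

def toy_isometric_decoder(z):
    """Hypothetical decoder: maps a 2-D code into 3-D along two orthonormal directions,
    so distances between codes are preserved exactly."""
    x, y = z
    return [0.6 * x, 0.8 * x, y]
```

For example, `step_lengths(toy_isometric_decoder, interpolate_latents([0.0, 0.0], [3.0, 4.0], 6))` gives five steps of equal size. A decoder that stretches some latent regions more than others would give uneven steps, which is the kind of abrupt change the abstract attributes to standard VAEs.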

Generative appearance replay for continual unsupervised domain adaptation

Boqi Chen , Kevin Thandiackal , Pushpak Pati , Orcun Goksel

Category: Computer Vision | Artificial Intelligence

2023-01-03

Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Category: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
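
The abstract mentions selecting the user property features with the greatest information gain. A minimal sketch of that kind of ranking for discrete features is given below. It uses the textbook definition, IG = H(labels) - H(labels | feature); the feature names in the usage note are hypothetical, and nothing here is taken from the MGTAB code base.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional

def top_features(features, labels, count):
    """Names of the `count` features with the highest information gain.
    `features` maps a feature name to its list of per-user values."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:count]
```

With labels `["bot", "bot", "human", "human"]`, a hypothetical feature `verified = [0, 0, 1, 1]` separates the classes perfectly and has a gain of one bit, while `has_bio = [0, 1, 0, 1]` has a gain of zero, so `top_features` ranks `verified` first. Continuous properties would need binning before this calculation applies.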

Explaining Imitation Learning through Frames

Boyuan Zheng , Jianlong Zhou , Chunjie Liu , Yiqiao Li , Fang Chen

Category: Machine Learning | Computer Vision

2023-01-03

As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
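
The masking-and-retraining loop described in this abstract can be sketched in the style of RISE: sample random binary masks over demonstration frames, score each mask, and average the scores over the runs that kept each frame. The callback `train_and_evaluate` is hypothetical; it stands for retraining the imitation learner on the kept frames and returning the environment return. The averaging rule and the default parameters are also assumptions, not details confirmed by the paper.

```python
import random

def frame_importance(num_frames, train_and_evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """RISE-style importance map over demonstration frames.

    train_and_evaluate(mask) is a hypothetical callback: it should retrain the
    imitation learner on the frames where mask[i] == 1 and return the environment return.
    """
    rng = random.Random(seed)
    weighted = [0.0] * num_frames  # sum of returns over runs that kept each frame
    kept = [0] * num_frames        # number of runs that kept each frame
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        score = train_and_evaluate(mask)
        for i, m in enumerate(mask):
            weighted[i] += score * m
            kept[i] += m
    # Average return over the runs that kept each frame; 0.0 if a frame was never kept.
    return [w / c if c else 0.0 for w, c in zip(weighted, kept)]
```

In practice each call to the callback means a full retraining run, so the number of masks is the main cost driver. A toy callback that returns a high score only when one particular frame is kept makes that frame stand out in the resulting map.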

Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

Liqun Lin , Yang Zheng , Weiling Chen , Chengdong Lan , Tiesong Zhao

Category: Computer Vision

2023-01-03

Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.

Risk-Averse MDPs under Reward Ambiguity

Haolin Ruan , Zhi Chen , Chin Pang Ho

Category: Machine Learning

2023-01-03

We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
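
To make the "weighted average of mean and percentile performances" concrete, the sketch below evaluates such an objective on a plain sample of returns for a fixed policy. It shows only the nominal, sample-based quantity; the Wasserstein ambiguity set and the reformulation from the paper are not modelled. The function names and the parameters `weight` and `risk_level` are illustrative assumptions.

```python
import math

def percentile_return(returns, risk_level):
    """Sample value y such that at least a (1 - risk_level) fraction of the returns are >= y."""
    ordered = sorted(returns)
    index = min(int(math.floor(risk_level * len(ordered))), len(ordered) - 1)
    return ordered[index]

def return_risk_objective(returns, weight, risk_level):
    """Weighted average of the mean return and the percentile return.
    weight = 1 recovers the pure mean criterion, weight = 0 the pure percentile criterion."""
    mean_return = sum(returns) / len(returns)
    return weight * mean_return + (1.0 - weight) * percentile_return(returns, risk_level)
```

For ten sampled returns 1, 2, ..., 10 and `risk_level = 0.1`, the percentile return is 2 (nine of the ten samples are at least 2) and the mean is 5.5, so `weight = 0.5` gives an objective of 3.75.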

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling

Penghao Wu , Li Chen , Hongyang Li , Xiaosong Jia , Junchi Yan , Yu Qiao

Category: Computer Vision

2023-01-03

Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

Category: Computer Vision

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.